Abstract

It is known that any rational abstract numeration system is faithfully, and
effectively, represented by an N-rational series. A simple proof of this result is
given which yields a representation of this series which in turn allows a simple
computation of the value of words in this system and easy constructions for
the recognition of recognisable sets of numbers.

It is also shown that conversely it is decidable whether an N-rational series
corresponds to a rational abstract numeration system.

1 Introduction 

In order to state our result, we first have to recall the definition, due to Lecomte
and Rigo [14], of an abstract numeration system and, in order to motivate it,
the more common one of numeration systems.

Numbers do exist independently of the way we represent them, and operations
on numbers are defined independently of the way they are computed. The role of
a numeration system is to set a framework in which numbers are represented by
words (over a suitable alphabet), which allows operations on numbers to be
described as algorithms on the representations, that is, on words.

The most common numeration system, in our modern times, is the $k$-ary
system where numbers are given their representation in base $k$, that is, written
as words over the alphabet $A_k = \{0, 1, \ldots, k-1\}$ and which do not start with 0
(but for the representation of 0 itself). The sequence of the representations of the
integers in the binary system is: {0, 1, 10, 11, 100, 101, 110, ...}.

While keeping the notion of position numeration system, the $k$-ary systems can
be generalised by replacing the sequence $(k^n)_{n \geq 0}$ with some increasing sequence
$U = (U_n)_{n \geq 0}$ of integers such that $U_0 = 1$. Using a greedy algorithm, every
integer $n$ is then given a representation in the 'base' $U$, called its $U$-representation
and denoted by $\langle n \rangle_U$. A well-known example is the Fibonacci numeration system
based on the sequence $F = (F_n)_{n \geq 0}$ of Fibonacci numbers starting with $F_0 = 1$
and $F_1 = 2$. In this system, every positive integer is given a canonical representation
which is computed by the greedy algorithm and which is characterised by
the fact that it does not contain 11 as a factor. The sequence of the representations of
the integers in the Fibonacci system is: {0, 1, 10, 100, 101, 1000, 1001, 1010, ...}.
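
The greedy algorithm mentioned above can be sketched in a few lines of Python. This is only an illustration: the function names `greedy_representation` and `fibonacci` are ours, and the sketch assumes that every digit produced is smaller than 10 so that it can be written as a single character.

```python
def greedy_representation(n, base):
    """Greedy U-representation of n, most significant digit first.

    `base(i)` returns U_i; the sequence is assumed increasing with U_0 = 1.
    """
    if n == 0:
        return "0"
    # Collect U_0, U_1, ... up to the largest term not exceeding n.
    terms = []
    i = 0
    while base(i) <= n:
        terms.append(base(i))
        i += 1
    digits = []
    for u in reversed(terms):
        # Greedy step: take as many copies of u as possible.
        q, n = divmod(n, u)
        digits.append(str(q))
    return "".join(digits)


def fibonacci(i):
    """F_i with the convention F_0 = 1 and F_1 = 2 used in the text."""
    a, b = 1, 2
    for _ in range(i):
        a, b = b, a + b
    return a


binary = [greedy_representation(n, lambda i: 2 ** i) for n in range(7)]
fib = [greedy_representation(n, fibonacci) for n in range(8)]
print(binary)
print(fib)
```

Run on the first integers, the two lists reproduce the binary and Fibonacci sequences quoted in the text.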

It is possible to look at these two numeration systems, the 2-ary system and
the Fibonacci system, independently from the sequences $(2^n)_{n \geq 0}$ and $(F_n)_{n \geq 0}$
and the greedy algorithm, and by just considering the set of words that represent
the integers: $1\{0,1\}^* \cup \{0\}$ in the first case, $\left(1\{0,1\}^* \setminus \{0,1\}^* 11 \{0,1\}^*\right) \cup \{0\}$ in the
second case, and by enumerating the elements of this set in the radix order.¹ In both
cases, every integer will be given the same representation without reference to the
way this representation is computed. It is the language of all representations that
matters and this naturally leads to the definition of abstract numeration systems.

Definition 1 ([14]). An abstract numeration system (or ANS for short) is a
triple $S = (L, A, <)$ where $A$ is an alphabet equipped with a total order $<$ and $L$
is an infinite language of $A^*$.

The system $S$ defines a one-to-one correspondence between $\mathbb{N}$ and $L$
by associating every integer $n$ with the $(n+1)$-th word of $L$ in the radix order
defined on $A^*$ by $<$. This representation of $n$ is denoted by $\langle n \rangle_S$ and conversely
the corresponding value of a word $w$ of $L$ is denoted by $\pi_S(w)$. Of course, the
following holds:

\[ \langle \pi_S(w) \rangle_S = w \quad\text{and}\quad \pi_S(\langle n \rangle_S) = n . \]

In most cases, the alphabet $A$ and the order $<$ on $A$ are fixed and understood and we
speak of the ANS defined by the language $L$ and we use the simpler notations $\langle n \rangle_L$
and $\pi_L(w)$.

If L is a rational language of A* , we say that the ANS is rational. 

Example 1. Let $A = \{a, b\}$, with $a < b$, and let $L_1$ be the language of words
with an even number of $b$'s: $L_1 = \{ w \in \{a,b\}^* \mid |w|_b \equiv 0 \bmod 2 \}$. The sequence
of the representations of the integers is: $\{\varepsilon, a, aa, bb, aaa, abb, bab, \ldots\}$ and, for
instance,

\[ \langle 18 \rangle_{L_1} = aabab \quad\text{and}\quad \pi_{L_1}(bbabb) = 29 . \]
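
Definition 1 can be tried out directly by brute force. The sketch below is not an efficient method (later sections give one); it simply enumerates a language in radix order. The membership predicate `in_language` is a hypothetical stand-in for an automaton, and all function names are ours.

```python
from itertools import count, islice, product


def radix_enumeration(alphabet, in_language):
    """Yield the words of the language in radix order.

    `alphabet` lists the letters in increasing order.
    """
    for length in count(0):
        # product() over an ordered alphabet yields the words of a given
        # length in lexicographic order, hence the radix order overall.
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if in_language(word):
                yield word


def representation(n, alphabet, in_language):
    """Return the (n+1)-th word of the language in radix order."""
    return next(islice(radix_enumeration(alphabet, in_language), n, None))


def value(word, alphabet, in_language):
    """Return the rank of `word` in the language, starting from 0."""
    if not in_language(word):
        raise ValueError("word is not in the language")
    for n, w in enumerate(radix_enumeration(alphabet, in_language)):
        if w == word:
            return n


def even_b(w):
    """Membership in the language of Example 1 (even number of b's)."""
    return w.count("b") % 2 == 0


first = list(islice(radix_enumeration("ab", even_b), 7))
print(first)
print(representation(18, "ab", even_b), value("bbabb", "ab", even_b))
```

The cost of `value` grows with the rank of the word, that is, exponentially with its length for the language of Example 1.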

Beyond the irrepressible appeal to generalisation and abstraction, a true moti- 
vation that supports the definition of ANS is to understand which properties of a 
numeration system depend upon the whole language of the representations only, 
and which are more directly related to the way the representation of every number 
is computed. For instance, we have shown in a previous paper [1] that the successor
function in a rational ANS is a piecewise co-sequential function, whereas the
characterisation of those systems for which this successor function is co-sequential
is known in the case of $\beta$-numeration systems (cf. [9]) but seems to be out of reach
for arbitrary rational ANS so far.

¹ The definition of radix order will be given below.
The purpose of this paper is to set up even tighter bonds between rational
abstract numeration systems and classical automata theory. We reach this goal
via the definition of the enumerating series of a numeration system and with the
use of its representation in the case where it is rational.

Definition 2. Let $S = (L, A, <)$ be an abstract numeration system. The enumerating
series of $S$ is the N-series over $A^*$ denoted by $E_S$ and defined by:

\[ E_S = \sum_{w \in L} \left( \pi_S(w) + 1 \right) w . \]

As above, the notation can be simplified as $E_L = \sum_{w \in L} (\pi_L(w) + 1)\, w$.

Remark 1. The above definition has been taken so that the language $L$ is entirely
determined by $E_L$. Indeed,

\[ L = \operatorname{supp}(E_L) . \]

One certainly could have taken $E_L = \sum_{w \in L} \pi_L(w)\, w$ as a definition for $E_L$.
All the results we are going to describe would have been valid and it may have
looked more natural. But we would have lost the information on the first word
of $L$, that is, the representation of 0.
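
As an illustration (computed here from the definition of the enumerating series and from the enumeration given in Example 1, not taken from the original text), the first terms of the enumerating series of the language $L_1$ of words with an even number of $b$'s read:

\[ E_{L_1} = \varepsilon + 2\,a + 3\,aa + 4\,bb + 5\,aaa + 6\,abb + 7\,bab + \cdots \]

The coefficient 1 of the empty word records that $\varepsilon$ is the representation of 0; with the alternative definition discussed in Remark 1 this term would vanish and $\varepsilon$ would no longer belong to the support.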

The starting point of our work is a direct proof of the following result (the 
definition of N-rational series will be recalled below). 

Theorem 1 ([7]). The enumerating series of a rational abstract numeration system
is an N-rational series.

In [7], Theorem 1 was a corollary of constructions set up for establishing the
rationality or algebraicity of a family of counting problems by means of rational
transductions. Theorem 1 was also given another and specialised proof in [16].
Even if both this and the original proofs are effective, the one we give below in
Section 3 amounts to computing directly a representation of $E_L$ from a representation
of (the characteristic series of) $L$ and also gives an even more compact
algorithm for calculating the coefficient of a word $w$ in $E_L$, that is, the value of $w$
in the system $L$ increased by 1. We then deduce from this latter algorithm the
construction of the automaton that recognises the set of representations in the
system $L$ of a recognisable set of numbers (Section 4). It is to be noted that the
same last construction was also given in [13] (cf. Remark 6).

The next result plays the role of a converse of Theorem 1: of course not every 
N-rational series is the enumerating series of a rational number system, but one 
can at least know when it is the case. 

Theorem 2. It is decidable whether an N-rational series is the enumerating series 
of a (rational) abstract numeration system or not. 

We end the paper with some problems that are directly inspired by Theorem 1. 
2 Preliminaries and notation



This paper makes use of several notions of automata theory such as unambiguous, 
or deterministic, automata, rational series and languages, with which the reader 
is supposed to be familiar. Definitions that are not given here are to be found in 
reference books such as [8, 3, 17]. Our notation is mainly that used in [17].

In the sequel, $A$ is a finite alphabet, $A^*$ the free monoid generated by $A$, and $1_{A^*}$
the empty word, identity of $A^*$. The length of a word $w$ in $A^*$ is denoted by $|w|$.
Let $A$ be totally ordered by $<$. The radix order $\prec$ on $A^*$ is defined by:²

\[ u \prec v \quad\text{if}\quad \begin{cases} \text{either } |u| < |v| , \\ \text{or } |u| = |v| ,\ u = w\,a\,u' ,\ v = w\,b\,v' \text{ and } a < b . \end{cases} \]

The radix order is a well order, that is, every non-empty subset of $A^*$ has a smallest
element for $\prec$, and can thus be used to enumerate any subset of $A^*$.
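
In code, the radix order amounts to comparing words first by length and then lexicographically. A minimal sketch, assuming the alphabet is given as a string listing the letters in increasing order (the names `radix_key` and `radix_less` are ours):

```python
def radix_key(word, alphabet):
    """Sort key for the radix order: length first, then letter ranks."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    return (len(word), [rank[c] for c in word])


def radix_less(u, v, alphabet):
    """Strict radix order: True when u comes strictly before v."""
    return radix_key(u, alphabet) < radix_key(v, alphabet)


words = ["ba", "b", "", "aa", "ab", "a", "bb"]
ordered = sorted(words, key=lambda w: radix_key(w, "ab"))
print(ordered)
```

As in the definition, `radix_less` is the strict part of the order: it returns `False` when both words are equal.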

Let $\mathbb{K}$ be a semiring; for instance, $\mathbb{N}$, the semiring of non-negative integers.
A (K-)series $s$ (over $A^*$) is a map from $A^*$ to $\mathbb{K}$, and the image of a word $w$
by $s$ is called the coefficient of $w$ in $s$ and is denoted by $\langle s, w \rangle$. The set of series
over $A^*$ with coefficients in $\mathbb{K}$ is denoted by $\mathbb{K}\langle\langle A^* \rangle\rangle$. The support of a series $s$ is
the language, denoted by $\operatorname{supp} s$, which contains those words whose coefficient in $s$
is different from $0_{\mathbb{K}}$. Conversely, the characteristic series of a language $L$ of $A^*$
is the N-series, denoted by $\underline{L}$, defined by $\langle \underline{L}, w \rangle = 1$ if $w$ is in $L$ and $\langle \underline{L}, w \rangle = 0$
otherwise.

We call (K-)representation, of dimension $n$, a triple $(\lambda, \mu, \nu)$ where $\mu$ is a morphism
$\mu \colon A^* \to \mathbb{K}^{n \times n}$ from $A^*$ to the $n \times n$-matrices with entries in $\mathbb{K}$, and $\lambda$
and $\nu$ are two vectors of dimension $n$ with entries in $\mathbb{K}$, $\lambda$ a row vector and $\nu$ a
column vector. A series $s$ in $\mathbb{K}\langle\langle A^* \rangle\rangle$ is (K-)recognisable if there exists a representation
$(\lambda, \mu, \nu)$ such that, for every $w$ in $A^*$,

\[ \langle s, w \rangle = \lambda \cdot \mu(w) \cdot \nu . \]

A series is (K-)rational if it is the behaviour of a finite (K-)automaton, that
is, an automaton with multiplicity in $\mathbb{K}$ (the behaviour of an automaton $\mathcal{A}$ is
the series where the coefficient of a word $w$ is the sum of the multiplicities of all
computations in $\mathcal{A}$ with label $w$). Finite K-automata whose transitions are labelled
by letters and K-representations are two ways to describe the same concept.³ The
illustration given with Example 2 suffices for the definition. As every K-automaton
is equivalent to one which is labelled by letters, the families of K-rational and
K-recognisable series coincide.

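Example 2. The language $L_1$ of words with an even number of $b$'s is recognised
by the automaton $\mathcal{A}_1$ drawn at Figure 1. The representation $(\lambda_1, \mu_1, \nu_1)$ associated
with $\mathcal{A}_1$ is

\[ \lambda_1 = \begin{pmatrix} 1 & 0 \end{pmatrix} , \quad \mu_1(a) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , \quad \mu_1(b) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \quad \nu_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} . \]

² Notice that $\prec$ is not reflexive: it is not the order itself but the strict part of the radix order.
³ This is true only because $A^*$ is a free monoid.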
In the sequel, we mostly use $\mathbb{N}$ as the semiring, and we may call representation
an N-representation.
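
The coefficient of a word in a recognisable series is obtained by multiplying a row vector by one matrix per letter. The sketch below illustrates this with plain Python lists; the matrices are our reading of the two-state automaton for words with an even number of b's (the letter a fixes each state, the letter b swaps them, and the first state is both initial and final), and the helper names are ours.

```python
def vec_mat(row, matrix):
    """Multiply a row vector by a matrix (both given as plain lists)."""
    return [sum(r * m for r, m in zip(row, column)) for column in zip(*matrix)]


def coefficient(word, lam, mu, nu):
    """Return lam . mu(word) . nu, reading the word letter by letter."""
    row = list(lam)
    for letter in word:
        row = vec_mat(row, mu[letter])
    return sum(x * y for x, y in zip(row, nu))


# Assumed representation of the automaton for "even number of b's".
LAMBDA_1 = [1, 0]
MU_1 = {"a": [[1, 0], [0, 1]], "b": [[0, 1], [1, 0]]}
NU_1 = [1, 0]

for w in ["", "ab", "bab", "bbabb"]:
    print(repr(w), coefficient(w, LAMBDA_1, MU_1, NU_1))
```

Since the automaton is deterministic, the result is 1 for the words of the language and 0 for the others, that is, the series realised is the characteristic series of the language.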




Figure 1: A DFA accepting words with an even number of b's



3 Representation of the enumerating series 

The proof of Theorem 1, as given in [7] where Theorem 1 is (part of) Corollary 8, is
based on the construction of an unambiguous rational transduction that associates
to every word $u$ all words $v$ that are greater than $u$ in the radix order. From this, it
is easy to derive that the image of the characteristic series of a rational language $L$
is a recognisable series, and equal to $E_L$, up to the intersection (or Hadamard
product) with $\underline{L}$. The advantage of this construction is that it can be applied
to unambiguous context-free languages and to various other counting functions
as well. The drawback is that it does not directly provide the representation
of $E_L$, although it is very similar to the one we develop below.

In [16], Theorem 1 is Proposition 29; its proof is more direct than in [7] in
the sense that it does not rely on the rational transduction machinery but makes use
instead of the characterisation of recognisable series as those which belong to a
finitely generated stable submodule of $\mathbb{N}\langle\langle A^* \rangle\rangle$. But this proof yields neither the
representation of $E_L$ nor a simple means to compute it.

3.1 Preparation 

If $a$ is a letter of $A$, let us denote by $A_a$ the set of letters of $A$ smaller than $a$:

\[ A_a = \{ b \in A \mid b < a \} . \]

If $u$ is a word of $A^*$, let us denote by $P(u)$ the set of words of $A^*$ (strictly) smaller
than $u$ in the radix order:

\[ P(u) = \{ v \in A^* \mid v \prec u \} . \]

This set $P(u)$ can be defined by induction on the length of $u$ by the following
remark. Any word smaller than $u$ followed by any letter is smaller than $ua$, and
so is $ub$ for any letter $b$ smaller than $a$, and the empty word is also smaller than $ua$.
These three sets are pairwise disjoint and any word smaller than $ua$ falls in one
of them. Altogether, we have proved the following lemma:⁴

Lemma 3. $\forall u \in A^* ,\ \forall a \in A \qquad P(ua) = 1_{A^*} \cup u A_a \cup P(u) A$ .

⁴ cf. Remark 6 below.
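
The decomposition of the set of words smaller than $ua$ into the empty word, the words $ub$ with $b$ smaller than $a$, and the words $vc$ with $v$ smaller than $u$ and $c$ any letter can be checked exhaustively on short words. The following brute-force sketch does so over a three-letter alphabet; the function names and the bound on the length are arbitrary choices of ours.

```python
from itertools import product


def words_up_to(alphabet, max_len):
    """All words of length at most max_len."""
    return ["".join(t) for n in range(max_len + 1)
            for t in product(alphabet, repeat=n)]


def smaller_words(u, alphabet):
    """The set P(u) of words strictly smaller than u in the radix order."""
    rank = {letter: i for i, letter in enumerate(alphabet)}

    def key(w):
        return (len(w), [rank[c] for c in w])

    return {v for v in words_up_to(alphabet, len(u)) if key(v) < key(u)}


def decomposition(u, a, alphabet):
    """The three sets of the lemma for the word u followed by the letter a."""
    smaller_letters = alphabet[:alphabet.index(a)]
    return ({""},
            {u + b for b in smaller_letters},
            {v + c for v in smaller_words(u, alphabet) for c in alphabet})


ALPHABET = "abc"
lemma_holds = True
for u in words_up_to(ALPHABET, 3):
    for a in ALPHABET:
        empty, same_prefix, shorter_prefix = decomposition(u, a, ALPHABET)
        union = empty | same_prefix | shorter_prefix
        # Equality of sets, and disjointness via the sum of cardinalities.
        disjoint = len(union) == len(empty) + len(same_prefix) + len(shorter_prefix)
        if union != smaller_words(u + a, ALPHABET) or not disjoint:
            lemma_holds = False
print(lemma_holds)
```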
Let $L$ be a rational language of $A^*$ and $(\lambda, \mu, \nu)$ the N-representation which
corresponds to an unambiguous finite automaton which recognises $L$:

\[ \forall w \in A^* \qquad \lambda \cdot \mu(w) \cdot \nu = 1 \iff w \in L . \]

We use the following notation: if $K$ is a (finite) subset of $A^*$, then $\mu(K) = \sum_{w \in K} \mu(w)$.
As $(\lambda, \mu, \nu)$ corresponds to an unambiguous automaton, we have:

\[ \forall K \subseteq A^* \qquad \lambda \cdot \mu(K) \cdot \nu = \sum_{w \in K} \lambda \cdot \mu(w) \cdot \nu = \operatorname{card}(K \cap L) . \tag{1} \]

3.2 Proof of Theorem 1 

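Let $S = (L, A, <)$ be a rational ANS, $\mathcal{A}$ an unambiguous automaton that recognises
$L$, $(\lambda, \mu, \nu)$ the corresponding N-representation, and $k$ its dimension. From (1)
follows:

\[ \forall u \in A^* \qquad \lambda \cdot \mu(P(u)) \cdot \nu = \operatorname{card}\left( \{ v \in A^* \mid v \in L \text{ and } v \prec u \} \right) , \]

and thus:

\[ \forall w \in L \qquad \lambda \cdot \mu(P(w)) \cdot \nu = \pi_L(w) . \]

From Lemma 3 follows:⁵

\[ \forall u \in A^* ,\ \forall a \in A \qquad \lambda \cdot \mu(P(ua)) \cdot \nu = \lambda \cdot \mu(1_{A^*}) \cdot \nu + \lambda \cdot \mu(u) \cdot \mu(A_a) \cdot \nu + \lambda \cdot \mu(P(u)) \cdot \mu(A) \cdot \nu . \tag{2} \]

Let $\sigma = \mu(A)$ and, for every $a$ in $A$, $\sigma_a = \mu(A_a)$. Thus (2) is rewritten as:

\[ \forall u \in A^* ,\ \forall a \in A \qquad \lambda \cdot \mu(P(ua)) \cdot \nu = \lambda \cdot \nu + \lambda \cdot \mu(u) \cdot \sigma_a \cdot \nu + \lambda \cdot \mu(P(u)) \cdot \sigma \cdot \nu . \tag{3} \]

Let $(\eta, \kappa, \zeta)$ be the representation of dimension $2k + 1$ described by the following
$(1, k, k)$-block decomposition:

\[ \eta = \begin{pmatrix} 1 & \lambda & 0 \end{pmatrix} , \qquad \forall a \in A \quad \kappa(a) = \begin{pmatrix} 1 & 0 & \lambda \\ 0 & \mu(a) & \sigma_a \\ 0 & 0 & \sigma \end{pmatrix} , \qquad \zeta = \begin{pmatrix} 0 \\ 0 \\ \nu \end{pmatrix} . \]

It is routine to verify, by induction on the length of $u$, and based on Lemma 3,
that $\lambda \cdot \mu(P(u)) \cdot \nu = \eta \cdot \kappa(u) \cdot \zeta$ for every $u$ in $A^*$. Indeed, the three blocks of
the row vector $\eta \cdot \kappa(u)$ are $1$, $\lambda \cdot \mu(u)$ and $\lambda \cdot \mu(P(u))$, and (3) is exactly the
induction step for the third block.

Let now $\xi = \begin{pmatrix} 1 \\ 0 \\ \nu \end{pmatrix}$ and let $s$ be the series realised by $(\eta, \kappa, \xi)$:

\[ \forall u \in A^* \qquad \langle s, u \rangle = 1 + \operatorname{card}\left( \{ v \in A^* \mid v \in L \text{ and } v \prec u \} \right) . \]

⁵ cf. Remark 6 below.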
In order to get the enumerating series of $L$, we must retain the words that belong
to $L$ only, that is, take the Hadamard product with the characteristic series $\underline{L}$
of $L$:

\[ E_L = s \odot \underline{L} , \]

and $E_L$ is N-rational as the Hadamard product of two N-rational series (this is
often referred to as (another) Schützenberger Theorem⁶). □

Remark 2. The construction underlying the proof yields for $E_L$ an N-representation
of dimension $2k^2 + k$.
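
The construction of the proof can be sketched as follows. From a representation of dimension k we build a representation of dimension 2k+1 whose row vector keeps track of three blocks (a constant 1, the current row of the automaton, and the count of smaller words of the language), and the enumerating coefficient is then obtained as a plain product of two coefficients instead of an explicit tensor representation of the Hadamard product. The block layout is the one derived from Equation (3); the function names are ours, and the example data are our reading of the automaton for words with an even number of b's.

```python
from itertools import product


def vec_mat(row, matrix):
    """Multiply a row vector by a matrix (both given as plain lists)."""
    return [sum(r * m for r, m in zip(row, column)) for column in zip(*matrix)]


def evaluate(word, initial, morphism, final):
    """Coefficient initial . morphism(word) . final of a representation."""
    row = list(initial)
    for letter in word:
        row = vec_mat(row, morphism[letter])
    return sum(x * y for x, y in zip(row, final))


def block_representation(lam, mu, nu, alphabet):
    """Build (eta, kappa, xi); `alphabet` lists letters in increasing order."""
    k = len(lam)
    running = [[0] * k for _ in range(k)]
    sigma_below = {}
    for a in alphabet:
        sigma_below[a] = running  # sum of mu(b) over the letters b < a
        running = [[x + y for x, y in zip(r, m)] for r, m in zip(running, mu[a])]
    sigma = running               # sum of mu(b) over all letters b
    kappa = {}
    for a in alphabet:
        top = [[1] + [0] * k + list(lam)]
        middle = [[0] + list(mu[a][i]) + sigma_below[a][i] for i in range(k)]
        bottom = [[0] * (k + 1) + sigma[i] for i in range(k)]
        kappa[a] = top + middle + bottom
    eta = [1] + list(lam) + [0] * k
    xi = [1] + [0] * k + list(nu)
    return eta, kappa, xi


ALPHABET = "ab"
LAM, NU = [1, 0], [1, 0]
MU = {"a": [[1, 0], [0, 1]], "b": [[0, 1], [1, 0]]}
ETA, KAPPA, XI = block_representation(LAM, MU, NU, ALPHABET)


def enumerating_coefficient(word):
    """Value of the word plus one if it is in the language, 0 otherwise."""
    return evaluate(word, ETA, KAPPA, XI) * evaluate(word, LAM, MU, NU)


WORDS = ["".join(t) for n in range(4) for t in product(ALPHABET, repeat=n)]
FIRST_TERMS = [(w, enumerating_coefficient(w)) for w in WORDS
               if enumerating_coefficient(w)]
print(FIRST_TERMS)
```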

3.3 Computation of the value of a word 

The description of an N-rational series $s$ by an N-representation gives a way to
compute the coefficient of any word $w$ in $s$. Theorem 1 thus solves ipso facto the
problem of computing the value $\pi_L(w)$ of a word $w$ in a rational abstract number
system $L$ (which occupies the whole Sect. 2 in [15]).

If $s$ has a representation $(\chi, \omega, \phi)$ of dimension $n$, and if $w$ is of length $\ell$, the
general algorithm consists in computing $\chi \cdot \omega(w_{i+1}) = (\chi \cdot \omega(w_i)) \cdot \omega(a_{i+1})$ for
$i = 0$ to $i = \ell - 1$, where $a_i$ is the $i$-th letter of $w$ and $w_i$ its prefix of length $i$.
Every step costs $2n^2$ operations, thus in total, roughly $2 \ell n^2$ operations.

It would not be such a good idea, however, to apply this general algorithm to
the representation of dimension $2k^2 + k$ we have obtained in the proof of Theorem 1
above. Its particular form allows, in fact, computing with vectors and matrices of
dimension $k$ only.

Given as above the unambiguous automaton $\mathcal{A}$ of dimension $k$ which recognises
$L$ and the corresponding N-representation $(\lambda, \mu, \nu)$, we associate a pair
$(\alpha(w), \gamma(w))$ with every $w$ in $A^*$, where $\alpha(w)$ and $\gamma(w)$ are two (row) vectors of
dimension $k$, $\alpha(w)$ with entries in $\{0, 1\}$, $\gamma(w)$ with entries in $\mathbb{N}$. The pair $(\alpha(w), \gamma(w))$
is computed by induction on the length of $w$ in the following way. Let $\ell$ be the
length of $w$, let

\[ \alpha(1_{A^*}) = \lambda , \quad \beta(1_{A^*}) = \lambda , \quad\text{and}\quad \gamma(1_{A^*}) = 0 , \]

and, for every $0 \leq i < \ell$, let

\[ \alpha(w_{i+1}) = \alpha(w_i) \cdot \mu(a_{i+1}) , \quad \beta(w_{i+1}) = \alpha(w_i) \cdot \sigma_{a_{i+1}} , \quad\text{and}\quad \gamma(w_{i+1}) = \lambda + \beta(w_{i+1}) + \gamma(w_i) \cdot \sigma . \]

All $\alpha(w)$ have entries in $\{0, 1\}$ since $\mathcal{A}$ is unambiguous. As a simple reformulation
of the preceding subsection, we have $\pi_L(w) = \gamma(w) \cdot \nu$ if $\alpha(w) \cdot \nu = 1$, that is,
if $w$ is recognised by $\mathcal{A}$ and thus in $L$, and $\pi_L(w)$ is undefined otherwise. This algorithm,
that is, the computation of $(\alpha(w), \gamma(w))$, costs roughly $6 \ell k^2$ operations.

⁶ cf. [3, Th. 1.5.3], [8, Prop. VI.7.1] or [17, Cor. III.3.9].
Example 3. Let us consider again the language $L_1$ and the DFA $\mathcal{A}_1$ of Figure 1.
We have thus:

\[ \sigma = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} , \quad \sigma_a = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} , \quad \sigma_b = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} . \]

The computation of $\pi_{L_1}(bbabb)$ for instance takes the following steps.

  $i$   $a_i$   $\alpha_i$   $\beta_i$   $\gamma_i$
  0             (1,0)        (1,0)       (0,0)
  1     b       (0,1)        (1,0)       (2,0)
  2     b       (1,0)        (0,1)       (3,3)
  3     a       (1,0)        (0,0)       (7,6)
  4     b       (0,1)        (1,0)       (15,13)
  5     b       (1,0)        (0,1)       (29,29)

And finally $\pi_{L_1}(bbabb) = (29, 29) \cdot \nu_1 = 29$.
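
The computation of the pair of vectors described in this subsection can be sketched in Python as follows. The function `word_value` and its helpers are our naming; the representation used in the demonstration is our reading of the two-state automaton for words with an even number of b's, and the alphabet is assumed to be given in increasing order.

```python
def vec_mat(row, matrix):
    """Multiply a row vector by a matrix (both given as plain lists)."""
    return [sum(r * m for r, m in zip(row, column)) for column in zip(*matrix)]


def dot(u, v):
    """Scalar product of two vectors."""
    return sum(x * y for x, y in zip(u, v))


def word_value(word, lam, mu, nu, alphabet):
    """Return (value, trace): the value of `word`, or None if it is rejected.

    `trace` lists the successive pairs (alpha, gamma), one per prefix.
    """
    k = len(lam)
    running = [[0] * k for _ in range(k)]
    sigma_below = {}
    for a in alphabet:
        sigma_below[a] = running  # sum of mu(b) over the letters b < a
        running = [[x + y for x, y in zip(r, m)] for r, m in zip(running, mu[a])]
    sigma = running               # sum of mu(b) over all letters b

    alpha, gamma = list(lam), [0] * k
    trace = [(alpha, gamma)]
    for letter in word:
        # beta and gamma use the alpha of the previous prefix.
        beta = vec_mat(alpha, sigma_below[letter])
        gamma = [l + b + g for l, b, g in zip(lam, beta, vec_mat(gamma, sigma))]
        alpha = vec_mat(alpha, mu[letter])
        trace.append((alpha, gamma))
    value = dot(gamma, nu) if dot(alpha, nu) == 1 else None
    return value, trace


LAM, NU = [1, 0], [1, 0]
MU = {"a": [[1, 0], [0, 1]], "b": [[0, 1], [1, 0]]}
value, trace = word_value("bbabb", LAM, MU, NU, "ab")
for alpha, gamma in trace:
    print(alpha, gamma)
print(value)
```

Only vectors and matrices of dimension k are manipulated, and a word that is not accepted by the automaton yields `None`.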

Remark 3. The computation of $(\alpha(w), \gamma(w))$ is very similar to the construction
called product of an automaton by a skew action in [18, 19].

4 Representation of recognisable subsets of numbers 

If $s$ is an N-rational series, that is, a map $s \colon A^* \to \mathbb{N}$, it is well known that
for any recognisable set of numbers $X$, $s^{-1}(X)$ is a rational set of $A^*$ (see [3,
Corol. III.2.4], [8, Th. VI.10.1] or [17, Corol. III.4.21], for instance). Theorem 1
thus directly implies the following statement, which has also been proved without
reference to it in [14] and in [13].

Corollary 4 ([14]). A recognisable set of numbers is $L$-recognisable in any rational
abstract numeration system $L$.

Although Corollary 4 formally requires no proof after the characterisation of rational
abstract numeration systems given by Theorem 1, it is interesting to further investigate
the construction which, given $L$ and a recognisable set of numbers $X$,
computes an automaton which recognises the set $\langle X \rangle_L$. The computation method
used in the preceding section (which is not the mere application of the general
result that yields Corollary 4) allows us to establish easily the following statement.

Proposition 5. Let $L$ be a rational language over $A^*$ recognised by a deterministic
automaton of dimension $k$. For any integers $p$ and $r < p$, let $X_{p,r} = p\,\mathbb{N} + r$
be the set of integers congruent to $r$ modulo $p$. Then the language $\langle X_{p,r} \rangle_L$ of
representations of numbers in $X_{p,r}$ is recognised by a deterministic automaton
with at most $k p^k$ states.

Proof. Let $\mathcal{A}$ be an automaton, with set of states $Q$ of cardinal $k$, which recognises
$L$ and $(\lambda, \mu, \nu)$ the corresponding N-representation. If $\mathcal{A}$ is deterministic,
then $\lambda$ and $\mu$ are row monomial and so are all $\alpha(w)$, for $w$ in $A^*$, which are thus
in 1-1 correspondence with the elements of $Q$.



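Let $\mathcal{C}$ be the automaton whose set of states is

\[ R = \{ (\alpha(w), \delta(w)) \mid w \in A^* \} \quad\text{where}\quad \delta(w) = \gamma(w) \bmod p . \]

Thus, $R \subseteq Q \times (\mathbb{Z}/p\mathbb{Z})^k$. The transitions of $\mathcal{C}$ are defined by:

\[ \forall w \in A^* ,\ \forall a \in A \qquad (\alpha(w), \delta(w)) \xrightarrow{\ a\ } (\alpha(wa), \delta(wa)) . \]

These transitions are well defined since, by the induction formulas of Section 3.3,
$\alpha(wa)$ and $\delta(wa)$ depend only on $\alpha(w)$, $\delta(w)$ and $a$. The initial state of $\mathcal{C}$ is
$(\lambda, 0)$ and its final states are those $(\alpha(w), \delta(w))$ where $\alpha(w)$
is final in $\mathcal{A}$ and $\delta(w) \cdot \nu \equiv r \bmod p$. It then follows that the language accepted
by $\mathcal{C}$ is $\langle X_{p,r} \rangle_L$. □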

Remark 4. If we start from an unambiguous automaton $\mathcal{A}$ of dimension $k$, the
same method yields a deterministic automaton $\mathcal{C}$ with at most $2^k p^k$ states.

Example 4. The automaton built in this way from $\mathcal{A}_1$ and for the recognisable
set of numbers $3\mathbb{N} + 1$ is the automaton $\mathcal{C}_1$ shown at Figure 2 (this automaton is
not minimal; its minimal quotient has only 8 states).
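
The construction used in the proof of Proposition 5 can be sketched as a breadth-first exploration of the reachable pairs made of a row vector of the automaton and a vector of residues modulo p. The code below is an illustration under our own naming (`congruence_automaton`, `accepts`); it builds only the reachable part, drops dead states, and is tried on our reading of the two-state automaton for words with an even number of b's with p = 3 and r = 1.

```python
from collections import deque
from itertools import product


def vec_mat(row, matrix):
    """Multiply a row vector by a matrix (plain lists or tuples)."""
    return [sum(r * m for r, m in zip(row, column)) for column in zip(*matrix)]


def dot(u, v):
    """Scalar product of two vectors."""
    return sum(x * y for x, y in zip(u, v))


def congruence_automaton(lam, mu, nu, alphabet, p, r):
    """Return (initial, transitions, finals, states) for values = r mod p."""
    k = len(lam)
    running = [[0] * k for _ in range(k)]
    sigma_below = {}
    for a in alphabet:
        sigma_below[a] = running  # sum of mu(b) over the letters b < a
        running = [[x + y for x, y in zip(row, m)] for row, m in zip(running, mu[a])]
    sigma = running

    initial = (tuple(lam), (0,) * k)
    transitions, states, queue = {}, {initial}, deque([initial])
    while queue:
        alpha, delta = queue.popleft()
        for a in alphabet:
            new_alpha = tuple(vec_mat(alpha, mu[a]))
            if not any(new_alpha):
                continue  # the underlying automaton has no such transition
            beta = vec_mat(alpha, sigma_below[a])
            shifted = vec_mat(delta, sigma)
            new_delta = tuple((l + b + d) % p
                              for l, b, d in zip(lam, beta, shifted))
            target = (new_alpha, new_delta)
            transitions[(alpha, delta), a] = target
            if target not in states:
                states.add(target)
                queue.append(target)
    finals = {(al, de) for al, de in states
              if dot(al, nu) == 1 and dot(de, nu) % p == r % p}
    return initial, transitions, finals, states


def accepts(automaton, word):
    """Run the deterministic automaton on the word."""
    state, transitions, finals, _ = automaton
    for letter in word:
        state = transitions.get((state, letter))
        if state is None:
            return False
    return state in finals


LAM, NU = [1, 0], [1, 0]
MU = {"a": [[1, 0], [0, 1]], "b": [[0, 1], [1, 0]]}
C1 = congruence_automaton(LAM, MU, NU, "ab", 3, 1)
WORDS = ["".join(t) for n in range(4) for t in product("ab", repeat=n)]
ACCEPTED = [w for w in WORDS if accepts(C1, w)]
print(ACCEPTED, len(C1[3]))
```

The number of reachable states printed at the end stays within the bound of Proposition 5, which is 18 for k = 2 and p = 3.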




Figure 2: A DFA recognising the set $3\mathbb{N} + 1$ in the ANS $L_1$.



Remark 5. In [15], another construction has been given for the same purpose.
The automaton $\mathcal{D}$ built with this other method and which recognises $\langle X_{p,r} \rangle_L$ is
not deterministic, but codeterministic and has, roughly, kp^~^^ states. Since $\mathcal{D}$ is
codeterministic, its determinisation yields the minimal automaton of $\langle X_{p,r} \rangle_L$ and
thus, thanks to Proposition 5, does not produce an exponential blow-up. We do
not know of a direct proof of this fact.

Remark 6. After the submitted version was written (and sent), we learned
of the reference [13]. Not only is Corollary 4 established there, but with a method
of proof which is very similar to ours. Our Lemma 3 is Lemma 1 in [13]. The
term representation is not used there but the matrices $\mu(a)$, $\sigma_a$ and $\sigma$ are defined
(under other notation) and used to give the same proof of Equation (2) (Lemma 2
in [13]).

Afterwards, [13] develops in another direction than this paper: it proves lower
bounds for the state complexity of $\langle X_{p,r} \rangle_L$ and shows that the property corresponding
to Corollary 4 does not hold for context-free languages.

Remark 7. If the numeration system considered is a positional numeration system
(and still a rational one), and under some supplementary hypotheses, then the
exact number of states for the minimal automaton of $\langle X_{p,r} \rangle_L$ can be computed
(cf. [6]).
5 Proof of Theorem 2



The image of a rational language by a rational relation (or transduction) is a
rational language; this classical result, due to Nivat and called Evaluation Theorem
in [8], extends to rational series, as we state now (cf. [17]).

Proposition 6. Let $\varphi \colon A^* \to B^*$ be an unambiguous rational relation and $s$ a
K-rational series over $A^*$. Then the series

\[ \varphi(s) = \sum_{w \in A^*} \langle s, w \rangle\, \varphi(w) = \sum_{u \in B^*} \langle s, \varphi^{-1}(u) \rangle\, u , \tag{4} \]

if it is defined, is a K-rational series over $B^*$.

It is this result that was used in [7] for the proof of Theorem 1. 

Proof of Theorem 2. Let $s$ in $\mathbb{N}\mathrm{Rat}\,A^*$ and $L = \operatorname{supp} s$ in $\mathrm{Rat}\,A^*$. The set $L$ is
totally ordered by the radix order

\[ L = \{ w_0 \prec w_1 \prec w_2 \prec \cdots \prec w_n \prec \cdots \} \]

and $\mathrm{Succ}_L$ is the function from $A^*$ into itself whose domain is $L$ and which maps
every $w_i$ to $w_{i+1}$. It is well known that $\mathrm{Succ}_L$ is a rational function ([4, 10]) and
hence unambiguous ([8, 17]). It then follows from Proposition 6 that the series

\[ \mathrm{Succ}_L(s) = \sum_{i=0}^{\infty} \langle s, w_i \rangle\, w_{i+1} \]

is an N-rational series and $t = s - \mathrm{Succ}_L(s)$ is a Z-rational series. Now, $s$ is the
enumerating series of the abstract numeration system $L$ if, and only if, $\langle s, w_0 \rangle = 1$
and, for every positive integer $i$, $\langle t, w_i \rangle = 1$, that is, since $\langle t, w_0 \rangle = \langle s, w_0 \rangle$,
if, and only if, $t - \underline{L \setminus \{w_0\}} = w_0$,
a condition which is known to be decidable as $\mathbb{Z}$ is a sub(semi)ring of a field (cf.
[8, 17]). □
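
The criterion used in the proof can be illustrated on finitely many words. The sketch below is not the decision procedure (which works on representations and relies on the decidability of equality of Z-rational series); it is only a bounded sanity check that walks through the support in radix order and tests that consecutive coefficients differ by 1, starting from 1. The series are given as hypothetical Python functions from words to integers, and all names are ours.

```python
from itertools import product


def looks_like_enumerating_series(coefficient, alphabet, max_len):
    """Bounded check on all words of length at most max_len.

    Returns False as soon as a word of the support has a coefficient that
    is not the coefficient of its predecessor in the support plus 1.
    """
    previous = 0  # makes the first word of the support require coefficient 1
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            c = coefficient("".join(letters))
            if c == 0:
                continue  # the word is not in the support
            if c - previous != 1:
                return False
            previous = c
    return True


def length_plus_one(word):
    """Series with support a*: the word a^n has coefficient n + 1."""
    return len(word) + 1 if set(word) <= {"a"} else 0


def power_of_two(word):
    """Series with support a*: the word a^n has coefficient 2^n."""
    return 2 ** len(word) if set(word) <= {"a"} else 0


print(looks_like_enumerating_series(length_plus_one, "ab", 6))
print(looks_like_enumerating_series(power_of_two, "ab", 6))
```

The first series is the enumerating series of the language a*, whereas the second one already fails on the word aa, whose coefficient 4 is not the successor of the coefficient 2 of a.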

6 Problems and future work 

Looking at abstract number systems as N-rational series naturally leads to two
families of questions. The first family consists of questions on N-rational series
which ask to what extent the series is related to abstract number systems; the
second consists of questions which generalise to N-rational series questions that are
usually considered for (abstract) numeration systems.

An example of a question in the first family is to ask whether it is decidable whether
a given N-rational series is the enumeration (in a radix ordering) of its (rational)
support in a certain, and unknown, abstract numeration system. This seems to
be a rather difficult problem. An obvious necessary condition for a series to be
a positive instance of this problem is itself a non-trivial problem that can be
formulated in the following way.
Conjecture 7. It is decidable whether an N-rational series is a monotone increasing
function (for a given order of letters).

A result due to Honkala [11] provides a kind of converse of Corollary 4 in
the case of $p$-ary numeration systems and states that it is decidable whether a
$p$-recognisable set of numbers is recognisable. The generalisation of this result to
larger classes of numeration systems has been recently studied in [2, 5]. Its generalisation
to abstract number systems has been stated as a problem in [12]. It is
also a typical example of a question in the second family.

Conjecture 8. It is decidable whether the set of coefficients of an N-rational 
series is a recognisable set of numbers. 

7 Summary 

In this short paper, we have presented a new idea for the study of abstract number 
systems, which brings to the subject the whole power of weighted automata theory. 
In return, the subject of abstract number systems naturally opens new questions 
for the theory of N-rational series. 